14 research outputs found

    Reducing cache coherence traffic with a NUMA-aware runtime approach

    Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, which trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform comprising 288 cores through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol combined with runtime managed NUMA-aware scheduling and data allocation techniques to make most efficient use of the added hardware. The effectiveness of this joint approach is demonstrated by speedups of 3.14× to 9.97× and coherence traffic reductions of up to 99% in comparison to NUMA-oblivious scheduling and data allocation.
    This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493), by the Spanish Ministry of Science and Innovation (contracts TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. The Mont-Blanc project receives funding from the EU’s H2020 Framework Programme (H2020/2014-2020) under grant agreement no 671697. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243). Peer Reviewed. Postprint (author's final draft)

    The Mont-Blanc prototype: an alternative approach for high-performance computing systems

    High-performance computing (HPC) is recognized as one of the pillars for further advance of science, industry, medicine, and education. Current HPC systems are being developed to overcome emerging challenges in order to reach Exascale level of performance, which is expected by the year 2020. The much larger embedded and mobile market allows for rapid development of IP blocks, and provides more flexibility in designing an application-specific SoC, in turn making it possible to balance performance, energy efficiency, and cost. In the Mont-Blanc project, we advocate for HPC systems to be built from such commodity IP blocks, currently used in embedded and mobile SoCs. As a first demonstrator of such an approach, we present the Mont-Blanc prototype; the first HPC system built with commodity SoCs, memories, and NICs from the embedded and mobile domain, and off-the-shelf HPC networking, storage, cooling and integration solutions. We present the system’s architecture and an evaluation covering both performance and energy efficiency. Further, we compare the system’s abilities against a production level supercomputer. At the end, we discuss parallel scalability, and estimate the maximum scalability point of this approach across a set of HPC applications. Postprint (published version)

    Modulation of VEGF expression and oxidative stress response by iodine deficiency in irradiated cancerous and non-cancerous breast cells

    Breast cancer remains a major concern and its physiopathology is influenced by iodine deficiency (ID) and radiation exposure. Since radiation and ID can separately induce oxidative stress (OS) and microvascular responses in breast, their combination could additively increase these responses. Therefore, ID was induced in MCF7 and MCF12A breast cell lines by medium change. Cells were then X-irradiated with doses of 0.05, 0.1, or 3 Gy. In MCF12A cells, both ID and radiation (0.1 and 3 Gy) increased OS and vascular endothelial growth factor (VEGF) expression, with an additive effect when the highest dose was combined with ID. However, in MCF7 cells no additive effect was observed. VEGF mRNA up-regulation was reactive oxygen species (ROS)-dependent, involving radiation-induced mitochondrial ROS. Results on total VEGF mRNA hold true for the pro-angiogenic isoform VEGF165 mRNA, but the treatments did not modulate the anti-angiogenic isoform VEGF165b. Radiation-induced antioxidant response was differentially regulated upon ID in both cell lines. Thus, radiation response is modulated according to iodine status and cell type and can lead to additive effects on ROS and VEGF. As these are often involved in cancer initiation and progression, we believe that iodine status should be taken into account in radiation prevention policies

    Mont-Blanc 2020: Towards Scalable and Power Efficient European HPC Processors

    The Mont-Blanc 2020 (MB2020) project has triggered the development of the next generation industrial processor for Big Data and High Performance Computing (HPC). MB2020 is paving the way to the future low-power European processor for exascale, defining the System-on-Chip (SoC) architecture and implementing new critical building blocks to be integrated in such an SoC. In this paper, we first present an overview of the MB2020 project, then we describe our experimental infrastructure, the requirements of relevant applications, and the IP blocks developed in the project. Finally, we present our emulation-based final demonstrator and explain how it integrates within our first generation of HPC processors.
    This work is supported by the European Community’s Horizon 2020 Framework Programme under the Mont-Blanc 2020 project, grant agreement n. 779877. Peer Reviewed. Postprint (author's final draft)

    RED-SEA: Network Solution for Exascale Architectures

    In order to enable Exascale computing, next generation interconnection networks must scale to hundreds of thousands of nodes, and must provide features to also allow the HPC, HPDA, and AI applications to reach Exascale, while benefiting from new hardware and software trends. RED-SEA will pave the way to the next generation of European Exascale interconnects, including the next generation of BXI, as follows: (i) specify the new architecture using hardware-software co-design and a set of applications representative of the new terrain of converging HPC, HPDA, and AI; (ii) test, evaluate, and/or implement the new architectural features at multiple levels, according to the nature of each of them, ranging from mathematical analysis and modeling, to simulation, or to emulation or implementation on FPGA testbeds; (iii) enable seamless communication within and between resource clusters, and therefore development of a high-performance low latency gateway, bridging seamlessly with Ethernet; (iv) add efficient network resource management, thus improving congestion resiliency, virtualization, adaptive routing, collective operations; (v) open the interconnect to new kinds of applications and hardware, with enhancements for end-to-end network services - from programming models to reliability, security, low-latency, and new processors; (vi) leverage open standards and compatible APIs to develop innovative reusable libraries and Fabrics management solutions. ISSN:1089-650

    Towards Resilient EU HPC Systems: A Blueprint

    This document aims to spearhead a Europe-wide discussion on HPC system resilience and to help the European HPC community define best practices for resilience. It analyses a wide range of state-of-the-art resilience mechanisms and recommends the most effective approaches to employ in large-scale HPC systems. These guidelines will be useful in the allocation of available resources, as well as guiding researchers and research funding towards the enhancement of resilience approaches with the highest priority and utility. Although it is focused on the needs of next generation HPC systems in Europe, the principles and evaluations are applicable globally. This document is the first output of the ongoing European HPC resilience initiative and it covers individual nodes in HPC systems, encompassing CPU, memory, intra-node interconnect and emerging FPGA-based hardware accelerators. With community support and feedback on this initial document, we will update the analysis and expand the scope to include other types of accelerators, as well as networks and storage. The need for resilience features is analysed based on three guiding principles:
    1. The resilience features implemented in HPC systems should assure that the failure rate of the system is below an acceptable threshold, representative of the technology, system size and target application.
    2. Given the high cost incurred by uncorrected error propagation, hardware errors should be detected and corrected frequently and at low overhead, which is likely only possible in hardware.
    3. Overheating is one of the main causes of unreliable device behaviour. Production HPC systems should prevent overheating while balancing power/energy and performance.
    Based on these principles, the main outcome of this document is that the following features should be given priority during the design, implementation and operation of any large-scale HPC system:
    • ECC in main memory
    • Memory demand and patrol scrub
    • Memory address parity protection
    • Error detection in CPU caches and registers
    • Error detection in the intra-node interconnect
    • Packet retry in the intra-node interconnect
    • Reporting corrected errors to the BIOS or OS (system software requirement)
    • Memory thermal throttling
    • Dynamic voltage and frequency scaling for CPUs, FPGAs and ASICs
    • Over-temperature shutdown mechanism for FPGAs
    • ECC in FPGA on-chip data memories as well as in configuration memories
    The remaining state-of-the-art resilience features surveyed in this document should only be developed and implemented after a more detailed and specific cost–benefit analysis.

    The Mont-Blanc prototype: An Alternative Approach for HPC Systems

    High-performance computing (HPC) is recognized as one of the pillars for further progress in science, industry, medicine, and education. Current HPC systems are being developed to overcome emerging architectural challenges in order to reach Exascale level of performance, projected for the year 2020. The much larger embedded and mobile market allows for rapid development of intellectual property (IP) blocks and provides more flexibility in designing an application-specific system-on-chip (SoC), in turn providing the possibility in balancing performance, energy-efficiency, and cost. In the Mont-Blanc project, we advocate for HPC systems being built from such commodity IP blocks, currently used in embedded and mobile SoCs. As a first demonstrator of such an approach, we present the Mont-Blanc prototype; the first HPC system built with commodity SoCs, memories, and network interface cards (NICs) from the embedded and mobile domain, and off-the-shelf HPC networking, storage, cooling, and integration solutions. We present the system’s architecture and evaluate both performance and energy efficiency. Further, we compare the system’s abilities against a production level supercomputer. At the end, we discuss parallel scalability and estimate the maximum scalability point of this approach across a set of applications.
